Syntactic annotation of spoken utterances: A case study on the Czech Academic Corpus

Authors

  • Barbora Hladká
  • Zdenka Uresová
Abstract

Corpus annotation plays an important role in linguistic analysis and computational processing of both written and spoken language. Syntactic annotation of spoken texts has clearly become a topic of considerable interest, driven by the desire to improve automatic speech recognition systems by incorporating syntax into their language models, or to build language understanding applications. Syntactic annotation of both written and spoken texts in the Czech Academic Corpus was created thirty years ago, when no other (even annotated) corpus of spoken texts existed. We discuss how relevant and inspiring this annotation is for current frameworks of spoken text annotation.

Similar resources

Syntactic Feature of EFL Speakers’ Conference Presentations: The Case of Passive Voice and Pseudo-Cleft

Acquiring proficiency in academic genres is a key factor in the research community. Among the various genres in academic discourse communities, the spoken genre, especially Conference Presentations (CPs), plays a crucial role in research communities, though research on this important genre is still in its infancy. Therefore, the present study aims to shed light on the import...

Prague Dependency Treebank Annotation Errors: A Preliminary Analysis

This paper presents a basic analysis of syntactic annotation errors and inconsistencies in the Prague Dependency Treebank, the biggest corpus of Czech with manual syntactic annotation. The corpus is used for developing and testing of many syntactic analysers of Czech and the problems in the annotation have an essential impact on the evaluation of the quality of these parsers and the results of ...

Oral2008: New Balanced Corpus of Spoken Czech

Attention paid to spoken language has increased in the last decades, as well as its importance for linguistic research and natural language processing in general. However, compilation of spoken corpora as an indispensable source of data is very laborious and thus expensive. Nevertheless, more and more spoken corpora are being created currently. There are various approaches to their design, dept...

Spoken Requests for Tourist Information: a Speech Acts Annotation

This paper presents an ongoing corpus annotation of speech acts in the domain of tourism, which falls within a wider project on multimodal question answering. An annotation scheme and set of guidelines are developed to mark information about parts of spoken utterances which require a response, distinguishing them from parts of utterances which do not. The corpus used for annotation consists of ...

Spanish Phoneme Classification by Means of a Hierarchy of Kohonen Self-Organizing Maps

  • Research Issues for the Next Generation Spoken Dialogue Systems
  • Data-Driven Analysis of Speech
  • Towards a Road Map for Machine Translation Research
  • The Prague Dependency Treebank: Crossing the Sentence Boundary
  • Text Tiered Tagging and Combined Language Models Classifiers
  • Syntactic Tagging
  • Information, Language, Corpus and Linguistics
  • Prague Dependency Tre...

Publication date: 2009